NSF PAR Search | NSF Public Access Repository

Provenance Tracing in Network Diffusion Algorithms

Tasnina, Nure; Crovella, Mark; Kasif, Simon; Murali, T M (January 2026, Proceedings of the Pacific Symposium on Biocomputing)

We propose a novel strategy for provenance tracing in random walk-based network diffusion algorithms, a problem that has been surprisingly overlooked in spite of the widespread use of diffusion algorithms in biological applications. Our path-based approach enables ranking paths by the magnitude of their contribution to each node’s score, offering insight into how information propagates through a network. Building on this capability, we introduce two quantitative measures: (i) path-based effective diffusion, which evaluates how well a diffusion algorithm leverages the full topology of a network, and (ii) diffusion betweenness, which quantifies a node’s importance in propagating scores. We applied our framework to SARS-CoV-2 protein interactors and human PPI networks. Provenance tracing of the Regularized Laplacian and Random Walk with Restart algorithms revealed that a substantial amount of a node’s score is contributed via multi-edge paths, demonstrating that diffusion algorithms exploit the non-local structure of the network. Analysis of diffusion betweenness identified proteins playing a critical role in score propagation; proteins with high diffusion betweenness are enriched with essential human genes and interactors of other viruses, supporting the biological interpretability of the metric. Finally, in a signaling network composed of causal interactions between human proteins, the top contributing paths showed strong overlap with COVID-19-related pathways. These results suggest that our path-based framework offers valuable insight into diffusion algorithms and can serve as a powerful tool for interpreting diffusion scores in a biologically meaningful context, complementing existing module- or node-centric approaches in systems biology. The code is publicly available at https://github.com/n-tasnina/provenance-tracing.git under the GNU General Public License v3.0.
Free, publicly-accessible full text available January 3, 2027
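As a rough illustration of the idea in the abstract above, a Random Walk with Restart score can be written as a sum over walk lengths, p = alpha * sum_k (1 - alpha)^k W^k s, so each term shows how much of a node's score arrives over walks of exactly k edges. The sketch below is not the authors' implementation (their code is at the repository named in the abstract): it groups contributions by walk length rather than ranking individual paths, and the function name, the toy graph, and the restart probability alpha = 0.5 are all assumptions chosen for illustration.

```python
def rwr_by_walk_length(adj, seed, alpha=0.5, max_len=50):
    """Split Random Walk with Restart scores by walk length.

    adj maps each node to its list of neighbours (undirected graph,
    every node has at least one neighbour). Returns a list whose k-th
    entry maps node -> score contributed by walks of exactly k edges.
    """
    nodes = sorted(adj)
    # current holds W^k s; for k = 0 all mass sits on the seed node.
    current = {v: 1.0 if v == seed else 0.0 for v in nodes}
    contributions = []
    for k in range(max_len + 1):
        weight = alpha * (1 - alpha) ** k
        contributions.append({v: weight * current[v] for v in nodes})
        # One walk step: each node splits its mass evenly among neighbours.
        nxt = {v: 0.0 for v in nodes}
        for u in nodes:
            share = current[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        current = nxt
    return contributions


# Hypothetical toy graph: a simple chain a - b - c - d, seeded at "a".
toy = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
terms = rwr_by_walk_length(toy, seed="a")

score_d = sum(t["d"] for t in terms)           # total (truncated) score of d
multi_edge_d = sum(t["d"] for t in terms[2:])  # part arriving over >= 2 edges
print(score_d, multi_edge_d)
```

In this toy chain, node d sits three edges from the seed, so its entire score comes from walks of three or more edges; on a real network the split between direct and multi-edge contributions would differ from node to node.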
Privacy-Preserving Machine Learning on Web Browsing for Public Opinion

Buxbaum, Sam; Tassis, Lucas M; Boschelli, Lucas; Comarela, Giovanni; Varia, Mayank; Crovella, Mark; Christenson, Dino P (December 2025, Springer)

We present a real-world deployment of secure multiparty computation to predict political preference from private web browsing data. To estimate aggregate preferences for the 2024 U.S. presidential election candidates, we collect and analyze secret-shared data from nearly 8000 users from August 2024 through February 2025, with over 2000 daily active users sustained throughout the bulk of the survey. The use of MPC allows us to compute over sensitive web browsing data that users would otherwise be more hesitant to provide. We collect data using a custom-built Chrome browser extension and perform our analysis using the CrypTen MPC library. To our knowledge, we provide the first implementation under MPC of a model for the learning from label proportions (LLP) problem in machine learning, which allows us to train on unlabeled web browsing data using publicly available polling and election results as the ground truth.
Free, publicly-accessible full text available December 4, 2026
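To make the notion of computing over secret-shared data in the abstract above more concrete, here is a minimal sketch of additive secret sharing, one common building block of secure multiparty computation. It is an assumption that this resembles what the deployment does internally; the abstract names the CrypTen library but does not describe its protocol, and the sketch does not use CrypTen. The modulus, the three-party setup, and the visit counts are hypothetical.

```python
import secrets

MODULUS = 2 ** 64  # assumed ring size, chosen only for illustration


def share(value, n_parties=3):
    """Split an integer into n_parties random shares that sum to it mod MODULUS."""
    parts = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    parts.append((value - sum(parts)) % MODULUS)
    return parts


def reconstruct(parts):
    """Recombine shares; any proper subset alone reveals nothing about the value."""
    return sum(parts) % MODULUS


# Each hypothetical user secret-shares a private count (e.g. visits to a site).
visits = [3, 0, 7]
per_user = [share(v) for v in visits]

# Each party adds up the shares it holds, never seeing a raw count.
party_totals = [sum(column) % MODULUS for column in zip(*per_user)]

# Only the aggregate is revealed when the parties combine their totals.
total = reconstruct(party_totals)
print(total)
```

Because addition commutes with sharing, the parties can compute the aggregate without any of them learning an individual user's value, which is the property that makes aggregate estimates over sensitive browsing data possible.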
Toward a Representative DNS Data Corpus: A Longitudinal Comparison of Collection Methods

https://doi.org/10.23919/TMA66427.2025.11096967

Kranig, Calvin; Pauley, Eric; Wung, Wei-Shiang; Barford, Paul; Crovella, Mark; Sommers, Joel (June 2025, IEEE)

Free, publicly-accessible full text available June 10, 2026
Squatspotting: Towards the Systematic Measurement of Typosquatting Techniques

Wung, Wei-Shiang; Kranig, Calvin; Pauley, Eric; Barford, Paul; Crovella, Mark; Sommers, Joel (May 2025, IFIP)

Free, publicly-accessible full text available May 26, 2026
Squatspotting: Towards the Systematic Measurement of Typosquatting Techniques

Wung, Wei Shiang; Kranig, Calvin; Pauley, Eric; Barford, Paul; Crovella, Mark; Sommers, Joel (May 2025, IFIP)

Typosquatting (the practice of registering a domain name similar to another, usually well-known, domain name) is typically intended to drive traffic to a website for malicious or profit-driven purposes. In this paper we assess the current state of typosquatting, both broadly (across a wide variety of techniques) and deeply (using an extensive and novel dataset). Our breadth derives from the application of eight different candidate-generation techniques to a selection of the most popular domain names. Our depth derives from probing the resulting name set via a unique corpus comprising over 3.3B Domain Name System (DNS) records. We find that over 2.3M potential typosquatting names have been registered that resolve to an IP address. We then assess those names using a framework focused on identifying the intent of the domain from the perspectives of DNS and webpage clustering. Using the DNS information, HTTP responses, and Google SafeBrowsing, we classify the candidate typosquatting names as resolved to private IP, malicious, defensive, parked, legitimate, or unknown intents. Our findings provide the largest-scale and most-comprehensive perspective to date on typosquatting, exposing potential risks to users. Further, our methodology provides a blueprint for tracking and classifying typosquatting on an ongoing basis.
Free, publicly-accessible full text available May 26, 2026
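The abstract above mentions eight candidate-generation techniques without listing them. As a hedged illustration of what candidate generation can look like, the sketch below applies three commonly discussed typo models (character omission, character duplication, and adjacent transposition) to the first label of a domain. Whether these three are among the paper's eight is an assumption, and the function name is hypothetical.

```python
def typo_candidates(domain):
    """Generate simple typo variants of the first label of a domain name."""
    label, dot, rest = domain.partition(".")
    variants = set()
    for i in range(len(label)):
        # Omission: drop one character ("example" -> "exmple").
        variants.add(label[:i] + label[i + 1:])
        # Duplication: repeat one character ("example" -> "exaample").
        variants.add(label[:i] + label[i] * 2 + label[i + 1:])
    for i in range(len(label) - 1):
        # Transposition: swap two adjacent characters ("example" -> "exmaple").
        variants.add(label[:i] + label[i + 1] + label[i] + label[i + 2:])
    # Drop the original label and any empty result.
    variants.discard(label)
    variants.discard("")
    return sorted(v + dot + rest for v in variants)


candidates = typo_candidates("example.com")
print(len(candidates), candidates[:5])
```

A measurement pipeline along the lines the abstract describes would then check which candidates are actually registered and resolve to an IP address; this sketch covers only the generation step and does not filter out syntactically invalid labels.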
Squatspotting: Towards the Systematic Measurement of Typosquatting Techniques

Wung, Wei-Shiang; Kranig, Calvin; Pauley, Eric; Barford, Paul; Crovella, Mark; Sommers, Joel (May 2025, IFIP Networking)

Free, publicly-accessible full text available May 7, 2026
An Elemental Decomposition of DNS Name-to-IP Graphs

https://doi.org/10.1109/INFOCOM52122.2024.10621147

Anderson, Alex; Mondal, Aadi Swadipto; Barford, Paul; Crovella, Mark; Sommers, Joel (May 2024, IEEE)

The Domain Name System (DNS) is a critical piece of Internet infrastructure with remarkably complex properties and uses, and accordingly has been extensively studied. In this study we contribute to that body of work by organizing and analyzing records maintained within the DNS as a bipartite graph. We find that relating names and addresses in this way uncovers a surprisingly rich structure. In order to characterize that structure, we introduce a new graph decomposition for DNS name-to-IP mappings, which we term elemental decomposition. In particular, we argue that (approximately) decomposing this graph into bicliques — maximally connected components — exposes this rich structure. We utilize large-scale censuses of the DNS to investigate the characteristics of the resulting decomposition, and illustrate how the exposed structure sheds new light on a number of questions about how the DNS is used in practice and suggests several new directions for future research.
Full Text Available
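To illustrate the bipartite view of DNS described in the abstract above, the sketch below builds a name-to-IP mapping from (name, address) records and groups names that resolve to exactly the same address set; each such group together with its addresses forms a biclique. This grouping rule is a deliberately simple stand-in and an assumption, not the paper's elemental decomposition, which the abstract describes only as an approximate decomposition into bicliques. The records use reserved example names and documentation addresses.

```python
from collections import defaultdict


def bicliques_by_identical_ip_set(records):
    """Group names sharing an identical IP set; each group is a biclique."""
    ips_of = defaultdict(set)
    for name, ip in records:
        ips_of[name].add(ip)

    # Names with the same frozen IP set are all connected to all of those IPs.
    groups = defaultdict(set)
    for name, ips in ips_of.items():
        groups[frozenset(ips)].add(name)

    return sorted((sorted(names), sorted(ips)) for ips, names in groups.items())


# Hypothetical records: (domain name, IPv4 address).
records = [
    ("a.example", "192.0.2.1"),
    ("a.example", "192.0.2.2"),
    ("b.example", "192.0.2.1"),
    ("b.example", "192.0.2.2"),
    ("c.example", "192.0.2.3"),
]
blocks = bicliques_by_identical_ip_set(records)
for names, ips in blocks:
    print(names, "<->", ips)
```

Here a.example and b.example share both addresses and so form a 2-by-2 biclique, while c.example forms a trivial 1-by-1 block; real DNS data would also contain partially overlapping address sets, which this exact-match rule does not merge.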
A Manifold View of Connectivity in the Private Backbone Networks of Hyperscalers

https://doi.org/10.1145/3604620

Salamatian, Loqman; Anderson, Scott; Mathews, Joshua; Barford, Paul; Willinger, Walter; Crovella, Mark (August 2023, Communications of the ACM)

As hyperscalers such as Google, Microsoft, and Amazon play an increasingly important role in today's Internet, they are also capable of manipulating probe packets that traverse their privately owned and operated backbones. As a result, standard traceroute-based measurement techniques are no longer a reliable means for assessing network connectivity in these global-scale cloud provider infrastructures. In response to these developments, we present a new empirical approach for elucidating connectivity in these private backbone networks. Our approach relies on using only lightweight (i.e., simple, easily interpretable, and readily available) measurements, but requires applying heavyweight mathematical techniques for analyzing these measurements. In particular, we describe a new method that uses network latency measurements and relies on concepts from Riemannian geometry (i.e., Ricci curvature) to assess the characteristics of the connectivity fabric of a given network infrastructure. We complement this method with a visualization tool that generates a novel manifold view of a network's delay space. We demonstrate our approach by utilizing latency measurements from available vantage points and virtual machines running in datacenters of three large cloud providers to study different aspects of connectivity in their private backbones and show how our generated manifold views enable us to expose and visualize critical aspects of this connectivity.
Full Text Available
Characterizing Covid Waves via Spatio-Temporal Decomposition

https://doi.org/10.1145/3534678.3539136

Quinn, Kevin; Terzi, Evimaria; Crovella, Mark (August 2022, ACM SIGKDD)

Full Text Available
